{
 "cells": [
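  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## *Python Data Mining Methods and Applications* (PyDm)\n",
    "### [Chapter 2: Analytical Foundations of Data Mining] Data and Exercises 2\n",
    "#### **(Write your code in the blank cell below each # prompt and display the result.)**"
   ]
  },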
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
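  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2.1 Survey data. A company surveyed the staff of its finance department on whether they smoke. The answers were:\n",
    "\n",
    "    No, No, No, Yes, Yes, No, No, Yes, No, Yes, No, No, Yes, Yes, No, Yes, No, No, Yes, Yes."
   ]
  },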
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use value_counts() to count the people giving each answer, then draw a bar chart with a different color for Yes and No.\n"
   ]
  },
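  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible approach to exercise 2.1, given as a minimal sketch rather than a reference solution. The variable names `smoke` and `counts` and the two bar colors are assumptions made for illustration: `value_counts()` tallies each answer and `plt.bar` draws one colored bar per answer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Survey answers from exercise 2.1, coded as 'Yes' / 'No' (variable name is an assumption)\n",
    "smoke = pd.Series(['No', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes',\n",
    "                   'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes'])\n",
    "\n",
    "counts = smoke.value_counts()   # number of people giving each answer\n",
    "print(counts)\n",
    "\n",
    "# One bar per answer, each in its own color (the color choice is arbitrary)\n",
    "plt.bar(counts.index, counts.values, color=['tab:blue', 'tab:orange'])\n",
    "plt.ylabel('Number of people')\n",
    "plt.show()"
   ]
  },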
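  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2.2 Medical data. A group of 50 drinkers was surveyed on the type of alcohol they drink. The drinkers are classified into four categories: red wine (1), white liquor (baijiu) (2), yellow rice wine (huangjiu) (3), and beer (4). The survey data are:\n",
    "\n",
    "    3,4,1,1,3,4,3,3,1,3,2,1,2,1,3,4,1,1,3,4,3,3,1,3,2,1,2,1,2,3,2,3,1,1,1,1,4,3,1,2,3,2,3,1,1,1,1,4,3,1."
   ]
  },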
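  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (1) Use value_counts() to count the drinkers in each category, then use pie() to draw a pie chart that distinguishes the drink types by color and text label.\n"
   ]
  },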
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (2) Use value_counts() to build your own frequency-table function for count data.\n"
   ]
  },
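  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A hedged sketch of what such a frequency-table function might look like. The function name `count_table`, the column names, and the variable `drinks` are hypothetical; the only library behavior relied on is that `value_counts()` returns the count of each distinct value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def count_table(x):\n",
    "    # Hypothetical helper: frequency and percentage of each category in count data\n",
    "    freq = pd.Series(x).value_counts().sort_index()   # counts, ordered by category code\n",
    "    return pd.DataFrame({'Frequency': freq,\n",
    "                         'Percent': freq / freq.sum() * 100})\n",
    "\n",
    "# Drink codes from exercise 2.2 (1 = red wine, 2 = baijiu, 3 = huangjiu, 4 = beer)\n",
    "drinks = [3, 4, 1, 1, 3, 4, 3, 3, 1, 3, 2, 1, 2, 1, 3, 4, 1, 1, 3, 4,\n",
    "          3, 3, 1, 3, 2, 1, 2, 1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 3, 1, 2,\n",
    "          3, 2, 3, 1, 1, 1, 1, 4, 3, 1]\n",
    "print(count_table(drinks))"
   ]
  },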
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (3) Define your own function that generates a frequency table for count data, and another that draws the frequency chart.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2.3 Salary data. The monthly salaries of the finance department staff at the company above are:\n",
    "\n",
    "    2050, 2100, 2200, 2300, 2350, 2450, 2500, 2700, 2900, 2850, 3500, 3800, 2600, 3000, 3300, 3200, 4000, 3100, 4200, 3500."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (1) Use the mean, median, var, and std functions to compute the mean, median, variance, and standard deviation of the data.\n"
   ]
  },
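  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch for part (1), assuming the salaries are held in a pandas Series named `salary` (the name is an assumption). Note that pandas computes the sample variance and standard deviation (`ddof=1`) by default, whereas `numpy.var` and `numpy.std` default to the population versions (`ddof=0`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Monthly salaries from exercise 2.3 (variable name is an assumption)\n",
    "salary = pd.Series([2050, 2100, 2200, 2300, 2350, 2450, 2500, 2700, 2900, 2850,\n",
    "                    3500, 3800, 2600, 3000, 3300, 3200, 4000, 3100, 4200, 3500])\n",
    "\n",
    "print('mean  :', salary.mean())\n",
    "print('median:', salary.median())\n",
    "print('var   :', salary.var())   # sample variance (pandas default ddof=1)\n",
    "print('std   :', salary.std())   # sample standard deviation (pandas default ddof=1)"
   ]
  },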
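  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (2) Draw a scatter plot and a histogram of the data, and use the hist function to build your own frequency-table function for continuous (measurement) data.\n"
   ]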
  },
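  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way the hist-based frequency table could be sketched. The function name `measure_table`, its `bins` parameter, and the column names are hypothetical. The sketch relies on `plt.hist` returning the bin counts and bin edges it used, so the histogram and the table always agree."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def measure_table(x, bins=10):\n",
    "    # Hypothetical helper: draw a histogram and tabulate the bins it used\n",
    "    counts, edges, _ = plt.hist(x, bins=bins, edgecolor='white')\n",
    "    return pd.DataFrame({'Lower': edges[:-1],               # left edge of each bin\n",
    "                         'Upper': edges[1:],                # right edge of each bin\n",
    "                         'Frequency': counts.astype(int),\n",
    "                         'Percent': counts / counts.sum() * 100})\n",
    "\n",
    "# Monthly salaries from exercise 2.3 (variable name is an assumption)\n",
    "salary = [2050, 2100, 2200, 2300, 2350, 2450, 2500, 2700, 2900, 2850,\n",
    "          3500, 3800, 2600, 3000, 3300, 3200, 4000, 3100, 4200, 3500]\n",
    "table = measure_table(salary, bins=5)   # 5 equal-width bins, an arbitrary choice\n",
    "plt.show()\n",
    "print(table)"
   ]
  },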
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (3) Define your own function that generates a frequency table for continuous data, and another that draws the frequency chart.\n"
   ]
  },
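  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2.4 Manager salaries. The 2015 incomes (in units of 10,000 yuan) of 66 company managers in a developed coastal city, each earning more than 100,000 yuan a year, are:\n",
    "\n",
    "    11, 19, 14, 22, 14, 28, 13, 81, 12, 43, 11, 16, 31, 16, 23, 42, 22, 26, 17, 22, 13, 27, 108, 16,\n",
    "    43, 82, 14, 11, 51, 76, 28, 66, 29, 14, 14, 65, 37, 16, 37, 35, 39, 27, 14, 17, 13, 38, 28, 40,\n",
    "    85, 32, 25, 26, 16, 120, 54, 40, 18, 27, 16, 14, 33, 29, 77, 50, 19, 34."
   ]
  },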
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (1) What can be said about the distribution of these salaries?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (2) Write a function that computes the basic statistics, and use it to analyze the central tendency and dispersion of the data.\n"
   ]
  },
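  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A hedged sketch of a basic-statistics function for part (2). The name `basic_stats`, the choice of statistics, and the variable `pay` are assumptions. The median and interquartile range are included alongside the mean and standard deviation because they are less sensitive to a few very large incomes such as 108 and 120."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def basic_stats(x):\n",
    "    # Hypothetical helper: common measures of central tendency and dispersion\n",
    "    s = pd.Series(x)\n",
    "    return pd.Series({'n': s.count(),\n",
    "                      'min': s.min(),\n",
    "                      'max': s.max(),\n",
    "                      'mean': s.mean(),\n",
    "                      'median': s.median(),\n",
    "                      'var': s.var(),      # sample variance (ddof=1)\n",
    "                      'std': s.std(),      # sample standard deviation (ddof=1)\n",
    "                      'IQR': s.quantile(0.75) - s.quantile(0.25),\n",
    "                      'skew': s.skew()})   # positive values indicate a long right tail\n",
    "\n",
    "# Manager incomes from exercise 2.4, in units of 10,000 yuan (variable name is an assumption)\n",
    "pay = [11, 19, 14, 22, 14, 28, 13, 81, 12, 43, 11, 16, 31, 16, 23, 42, 22, 26, 17, 22, 13, 27, 108, 16,\n",
    "       43, 82, 14, 11, 51, 76, 28, 66, 29, 14, 14, 65, 37, 16, 37, 35, 39, 27, 14, 17, 13, 38, 28, 40,\n",
    "       85, 32, 25, 26, 16, 120, 54, 40, 18, 27, 16, 14, 33, 29, 77, 50, 19, 34]\n",
    "print(basic_stats(pay))"
   ]
  },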
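  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (3) Why do the mean and the median of this data differ so much? What role do the variance and standard deviation play here? How should the central tendency and dispersion of this data be analyzed properly?\n"
   ]
  },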
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (4) Draw a scatter plot and a histogram of the data.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (5) Use your own functions to generate the frequency table and the frequency chart.\n"
   ]
  },
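  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2.5 The economics dataset records changes in US economic growth. It is a data frame with 478 rows and 6 variables:\n",
    "\n",
    "    date: date, one record per month; psavert: personal savings rate; pce: personal consumption expenditures, in billions of dollars;\n",
    "    unemploy: number of unemployed, in thousands; uempmed: median duration of unemployment, in weeks; pop: population, in thousands.\n",
    "\n",
    "Use each of the three plotting systems matplotlib, seaborn, and ggplot to draw:"
   ]
  },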
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>pce</th>\n",
       "      <th>pop</th>\n",
       "      <th>psavert</th>\n",
       "      <th>uempmed</th>\n",
       "      <th>unemploy</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>1967-06-30</td>\n",
       "      <td>507.8</td>\n",
       "      <td>198712</td>\n",
       "      <td>9.8</td>\n",
       "      <td>4.5</td>\n",
       "      <td>2944</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>1967-07-31</td>\n",
       "      <td>510.9</td>\n",
       "      <td>198911</td>\n",
       "      <td>9.8</td>\n",
       "      <td>4.7</td>\n",
       "      <td>2945</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>1967-08-31</td>\n",
       "      <td>516.7</td>\n",
       "      <td>199113</td>\n",
       "      <td>9.0</td>\n",
       "      <td>4.6</td>\n",
       "      <td>2958</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>1967-09-30</td>\n",
       "      <td>513.3</td>\n",
       "      <td>199311</td>\n",
       "      <td>9.8</td>\n",
       "      <td>4.9</td>\n",
       "      <td>3143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>1967-10-31</td>\n",
       "      <td>518.5</td>\n",
       "      <td>199498</td>\n",
       "      <td>9.7</td>\n",
       "      <td>4.7</td>\n",
       "      <td>3066</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date    pce     pop  psavert  uempmed  unemploy\n",
       "1  1967-06-30  507.8  198712      9.8      4.5      2944\n",
       "2  1967-07-31  510.9  198911      9.8      4.7      2945\n",
       "3  1967-08-31  516.7  199113      9.0      4.6      2958\n",
       "4  1967-09-30  513.3  199311      9.8      4.9      3143\n",
       "5  1967-10-31  518.5  199498      9.7      4.7      3066"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#!pip install pydataset           # install the pydataset package\n",
    "from pydataset import data        # import the data() loader from pydataset\n",
    "economics = data('economics')     # load the economics data frame shipped with pydataset\n",
    "economics.head()                  # show the first 5 rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (1) Draw a line chart with date on the x-axis and unemploy/pop on the y-axis.\n"
   ]
  },
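  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal matplotlib-only sketch for part (1), assuming the pydataset package is installed. The variable names `dates` and `rate` are assumptions, and the seaborn and ggplot versions are left to the reader. The `date` column is parsed with `pd.to_datetime` so that the x-axis is a true time axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from pydataset import data\n",
    "\n",
    "economics = data('economics')\n",
    "dates = pd.to_datetime(economics['date'])         # parse the date column into datetimes\n",
    "rate = economics['unemploy'] / economics['pop']   # unemployed share of the population\n",
    "\n",
    "plt.plot(dates, rate)\n",
    "plt.xlabel('date')\n",
    "plt.ylabel('unemploy / pop')\n",
    "plt.show()"
   ]
  },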
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (2) Draw a line chart with date on the x-axis and uempmed on the y-axis.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
